Objective and subjective feature evaluation for speaker-adaptive visual speech synthesis
نویسندگان
چکیده
This paper describes an evaluation of a feature extraction method for visual speech synthesis that is suitable for speaker-adaptive training of a Hidden Semi-Markov Model (HSMM)-based visual speech synthesizer. An audio-visual corpus from three speakers was recorded. While the features used for the auditory modality are well understood, we propose to use a standard Principal Component Analysis (PCA) approach to extract suitable features for training and synthesis of the visual modality. A PCA-based approach provides dimensionality reduction and component de-correlation on the 3D facial marker data which was recorded using a facial motion capturing system. Enabling visual average “voice” training and speaker-adaptation brings a key strength of the HMM framework into both the visual and the audio-visual domain. An objective evaluation based on reconstruction error calculations, as well as a perceptual evaluation with 40 test subjects, show that PCA is well suited for feature extraction from multiple speakers, even in a challenging adaptation scenario where no data from the target speaker is available during PCA.
منابع مشابه
Objective evaluation measures for speaker-adaptive HMM-TTS systems
This paper investigates using objective quality measures to evaluate speaker adaptation performance in HMM-based speech synthesis. We compare several objective measures to subjective evaluation results from our earlier work about 1) comparison of speaker adaptation methods for child voices and 2) effects of noise in speaker adaptation. The results analysed in this work indicate a reasonable cor...
متن کاملPerformance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007
This paper describes a speaker-independent/adaptive HMM-based speech synthesis system developed for the Blizzard Challenge 2007. The new system, named “HTS-2007”, employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previou...
متن کاملSpeaker-adaptive visual speech synthesis in the HMM-framework
In this paper we apply speaker-adaptive and speakerdependent training of hidden Markov models (HMMs) to visual speech synthesis. In speaker-dependent training we use data from one speaker to train a visual and acoustic HMM. In speaker-adaptive training, first a visual background model (average voice) from multiple speakers is trained. This background model is then adapted to a new target speake...
متن کاملLost Speech Reconstruction Method usin Missing Feature Theory and HMM
In recent years, IP telephone service has spread rapidly. However, an unavoidable problem of IP telephone service is deterioration of speech due to packet loss, which often occurs on wireless networks. To overcome this problem, we propose a novel lost speech reconstruction method using speech recognition based on Missing Feature Theory and HMM-based speech synthesis. The proposed method uses li...
متن کاملAutomatic feature selection for acoustic-visual concatenative speech synthesis: towards a perceptual objective measure
We present an iterative algorithm for automatic feature selection and weight tuning of target cost in the context of unit selection based audio-visual speech synthesis. We perform feature selection and weight tuning for a given unit-selection corpus to make the ranking given by the target cost function consistent with the ordering given by an objective dissimilarity measure. We explicitly perfo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013